VIDEO INDEXING AND IMAGE RETRIEVAL SYSTEM

The present invention relates generally to video signal processing. More
particularly, the invention relates to a video indexing and image retrieval system.

Over the last few years, there has been a dramatic increase in available
bandwidth over high-speed networks. At the same time, computer manufacturers
have improved the storage capacities of hard drives on personal computers, and improved
the speed of the system bus and the motherboards that access the hard drives. The
quality and efficiency of data compression algorithms has likewise improved
information transmission efficiency and access rates, particularly with respect to video
data.



One of the most important tasks of a database manager is to provide easy and
intuitive access to data. This task can be particularly difficult when a user would like
to search for images or other visual data such as video segments. Browsing and
[...] internet pages rapidly and intuitively through text-based searches.

To allow the searching of visual data, an image retrieval system must be able to
emphasize the similarity of a query image with images or frames of video stored in a
database. There are several ways that a user may provide a query image. For
example, users may have a rough idea of an image that they are looking for. The user
may develop a simple sketch of an image by hand and a scanner can be used to upload
[...] can be used to find other similar images in the database.



An image search engine must be able to generate a measurement of the
similarity between the query image and database images so that the user is presented
with a list of the most relevant database images to the least relevant images. The
[...] for similarities between significant features of the query sketch or image and the
database images while ignoring minor detail variations. In other words, the image
[...] database images invariantly.

When searching the video sequences, it would be inefficient for the image
retrieval system to compare the query image to every frame of the video sequence. A
[...] frames that are taken by one camera without interruption. To avoid the inefficiency,
the image database manager must take the time to segment the shots and identify a key
frame to represent the shots. To simplify this problem, it is desirable to perform video
segmentation and key frame identification automatically.

A first step towards automatic video indexing is the ability to identify both
abrupt transitions and gradual transitions. An abrupt transition is a discontinuous
transition between two images and is also referred to as a cut transition. Gradual
transitions include fade, dissolve, and wipe transitions. When an image gradually
disappears into black or white or gradually appears from black or white, a fade
transition occurs. When an image gradually disappears at the same time another
[...] blocks a second image, a wipe transition occurs. Gradual transitions are composite
shots that are created from more than one shot.



A shot transition detector must be sensitive to both abrupt transitions and
gradual transitions to be successful in an automatic video indexing system. The shot
transition detector should also be insensitive to other changes. In other words, the shot
transition detector should ignore small detail changes, image motion and camera
motion. For example, panning, zooming and tilting should not significantly impact the
query results.

Conventional video retrieval systems have employed several different types of
cut transition detection techniques including histogram difference, frame difference,
motion vector analysis, compression difference, and neural-network approaches.
Frame differencing detection systems are extremely sensitive to local motion.
Histogram detection systems successfully identify abrupt shot transitions and fades, but
work poorly on gradual transitions such as wipe and dissolve. Motion vector detection
systems require extensive computations that are prohibitive when large image
databases are used. Neural-network detection systems do not provide improved
performance over the other cut detection systems. Neural-network detection systems
require significant computations for the neural-network training process.

Additional algorithms have been proposed that address gradual transitions such
as fade, dissolve, and wipe transitions. Edge tracking systems measure the relative
values of entering and exiting edge percentages. Edge tracking systems are able to
correctly identify less than 20% of gradual transitions. Edge tracking systems require
a motion estimation step to align consecutive frames, which is computationally
expensive. The performance of the edge tracking system is highly dependent upon the
accuracy of the motion estimation step. Chromatic scaling systems assume that fade in
transitions and fade out transitions are to and from black only. Chromatic scaling
systems also assume that both object and camera motion are low immediately before
and after the transition period.

A video segmentation system according to the invention includes a video source
that provides a video sequence with a plurality of frames. The video segmentation
system generates an S-distance measurement between adjacent frames of the video
sequence. The S-distance measurement gauges the similarity between the adjacent
frames.

A frequency decomposer that preferably employs wavelet decomposition
generates a low frequency and a high frequency signature for each frame. A cut
detector identifies cut transitions between two adjacent frames using the low frequency
signature. The cut detector generates a difference signal between coefficients of the low
frequency signature for adjacent frames and compares the difference signal to a
threshold. If the difference signal exceeds the threshold, a cut transition is declared.

After identifying the cut transitions, the video segmentation system according
to the invention employs a fade detector that identifies fade transitions using the high
frequency signatures for frames located between the cut transitions. The fade detector
includes a summing signal generator that sums the coefficients of the high frequency
signature for each frame and compares the sum signal to a linear signal which is an
increasing function for fade in and a decreasing function for fade out. A dissolve
transition detector employs the high frequency signature to identify potential dissolve
transitions. A double frame difference generator confirms the dissolve transitions. As
can be appreciated, the video segmentation system according to the invention
dramatically improves the identification of abrupt and gradual transitions. The video
segmentation system achieves segmentation in a computationally efficient manner.

An image retrieval system according to the invention also employs an
S-distance measurement to compare a query image to images located in a database. The
S-distance measurement is used to allow a user to search and browse in a manner
similar to text-based systems provided by the Internet.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a video sequence that includes a plurality of shots;

FIG. 2 illustrates multiple frames of a video sequence that are associated with a
cut transition;

FIG. 3 illustrates multiple frames of a video sequence that are associated with a
fade transition;

FIG. 4 illustrates multiple frames of a video sequence that are associated with a
dissolve transition;

FIG. 5 is a functional block diagram of an automatic video indexing system
according to the invention;

FIG. 6 is a functional block diagram illustrating a cut transition detector of
FIG. 5 in further detail;

FIG. 7 is a flow chart diagram for the cut transition detector of FIG. 6;

FIG. 8 illustrates a fade detector of FIG. 5 in further detail;

FIG. 9 is a flow chart diagram illustrating fade transition detection for the fade
transition detector of FIG. 8;

FIG. 10 is a functional block diagram illustrating a dissolve transition segments
collector of FIG. 5 in further detail;

FIG. 11 is a flow chart diagram illustrating the operation of the dissolve
transition verifier;

FIG. 12 is a functional block diagram of one embodiment of an image retrieval
system; and

FIG. 13 is a functional block diagram of a second embodiment of the image
retrieval system.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring to FIG. 1, a video sequence 10 is illustrated and includes a plurality
of shots 12-1 to 12-n each including one or more frames 16-1 to 16-m. The video
sequence 10 includes n shots and m frames. An automatic video indexing system
according to the invention is preferably capable of identifying both abrupt and gradual
transitions between the n shots 12. After identifying the transitions between the n
shots, a key frame can be selected for each shot 12 for video indexing, retrieval and
other uses. The key frame can be a first frame in the shot, a middle frame or a
combination of frames. Not all transitions between the n shots are easy to identify.
For example, a transition between shot n-1 and shot n is a cut transition 20. A
transition between shot 1 and shot 2 is a dissolve transition 22.

FIGS. 2-4 illustrate frames associated with both abrupt and gradual shot
transitions. Referring now to FIG. 2, a first frame 30 of a video sequence starting at
time t is followed by a second frame 32 starting at time t+1. Because of the abrupt
transition between the frames 30 and 32, a cut transition is designated at time t+1
(identified at 34) in FIG. 2.

FIG. 3 illustrates n frames of a fade out transition 40. Frame 44 occurs at time
t, and an image 42 is readily distinguishable. Frame 46 occurs at the time t+1 and the
visibility of image 42' is somewhat reduced relative to the image 42 in frame 44. At
a subsequent time, the visibility of the image 42'' is further reduced until, at time t+n,
the image 42 generally disappears into a single color, such as black or white. A fade in
transition would be accomplished in reverse.

Referring now to FIG. 4, a dissolve transition 54 is illustrated and includes n
frames. At time t, the visibility of an image 56 in frame 58 is relatively high. At time
t+1, the visibility of the image 56' in frame 60 is reduced and a second image 62
becomes visible and has a relatively low visibility. At a subsequent time, the visibility of
image 56'' in frame 66 has decreased and the visibility of the image 62' has increased.
At time t+n, the image 56 in frame 70 has disappeared and the visibility of the image
62'' has increased.

Referring now to FIG. 5, an automatic video indexing system 80 for detecting
abrupt and gradual transitions is illustrated. The automatic video indexing system 80
includes a processor 84 that is connected to memory 86 and an input/output interface
90. The memory 86 includes read only memory (ROM), random access memory
(RAM), optical storage, hard drives, and/or other suitable storage. The automatic
video indexing system 80 includes a source of video sequences such as a local video
(image) database 92 or distributed video (image) databases such as a video (image)
database 94 that is available through a local area network (LAN) 96 or a video (image)
database 98 that is available through a wide area network (WAN) 100 which can be
connected to the Internet. The automatic video indexing system 80 also includes
input/output (I/O) devices 104 such as a keyboard, a mouse, one or more displays, an
image scanner, a printer, and/or other I/O devices.



A video sequence selector 110 allows the user to select one or more video
sequences 10 that may be stored in the video (image) databases 92, 94 and/or 98.
Video sequence selection can be performed in a conventional manner through dialog
boxes, which are navigated using mouse and/or keyboard selections. An image
extractor 114 extracts a thumbnail direct current (DC) image for each frame of a
selected video sequence. A frequency decomposer 116 is connected to the image
extractor 114 and generates a frequency domain decomposition of each thumbnail DC
image. The frequency decomposer can employ fast Fourier transform (FFT), discrete
cosine transform (DCT), discrete Fourier transform (DFT), or wavelet transformation.
Due to the computational efficiency of wavelet transforms such as Haar wavelet
transforms, wavelet transformation is preferred.
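
By way of illustration only, one level of a two-dimensional Haar decomposition of a
thumbnail image can be sketched as follows. The sketch assumes a grayscale image with
even dimensions stored as a list of rows, uses the averaging (unnormalized) form of the
Haar transform, and treats the top-left quadrant as the low frequency subband. The
function names haar_1d, haar_2d and split_subbands are hypothetical and do not appear in
the described embodiment.

```python
def haar_1d(values):
    """One Haar level on an even-length list: pairwise averages, then pairwise details."""
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages + details


def haar_2d(image):
    """One-level 2D Haar transform: transform every row, then every column."""
    rows = [haar_1d(row) for row in image]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    # cols holds transformed columns; transpose back to a list of rows.
    return [list(row) for row in zip(*cols)]


def split_subbands(coeffs):
    """Return (low, high): the top-left quadrant and all remaining coefficients."""
    h, w = len(coeffs) // 2, len(coeffs[0]) // 2
    low = [row[:w] for row in coeffs[:h]]
    high = [c for y, row in enumerate(coeffs)
            for x, c in enumerate(row) if y >= h or x >= w]
    return low, high


coeffs = haar_2d([[4, 2], [2, 0]])
low, high = split_subbands(coeffs)
print(low, high)
```

A uniform image yields zero high frequency coefficients under this sketch, which is why
the high frequency subband responds to edges and texture rather than to overall brightness.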

When Motion Picture Experts Group (MPEG) video sequence sources are
employed, they typically have a frame size of 512 by 512 pixels or larger. Generally
the DC image is generated for 8 by 8 pixel blocks. A typical thumbnail MPEG image
is 512/8 by 512/8 or 64 by 64 pixel blocks or larger. Decomposition of the MPEG
thumbnail frame images using wavelet transforms has been found to be a sufficient
input for generating high and low frequency domain components. This technique is
advantageous since the thumbnail DC image is much easier to extract from the MPEG
video as compared to AC coefficients. By employing only the thumbnail DC
components, lower computational time is required.
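
The size arithmetic above can be illustrated with a short sketch. It assumes a decoded
grayscale frame stored as a list of rows and computes the mean of each 8 by 8 block, which
is proportional to the DC term of that block's DCT. An actual MPEG-based system would
more likely read the DC coefficients from the compressed stream instead of decoding
pixels; the function name dc_thumbnail is hypothetical.

```python
def dc_thumbnail(frame, block=8):
    """Average each block x block tile of a grayscale frame (a list of pixel rows)."""
    height, width = len(frame), len(frame[0])
    thumb = []
    for top in range(0, height - block + 1, block):
        row = []
        for left in range(0, width - block + 1, block):
            total = sum(frame[y][x]
                        for y in range(top, top + block)
                        for x in range(left, left + block))
            row.append(total / (block * block))
        thumb.append(row)
    return thumb


# A 512 by 512 frame reduces to a 64 by 64 thumbnail.
thumb = dc_thumbnail([[0] * 512 for _ in range(512)])
print(len(thumb), len(thumb[0]))
```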

A low frequency component (LFC) signature generator 120 generates a LFC
signature for each image. A high frequency component (HFC) signature generator 124
generates a HFC signature. In one embodiment, wavelet transformation is employed
to generate the LFC and HFC signatures. The HFC and LFC signatures are generated
as follows:

F is a representation of the host content $\mathcal{I}$. F' is a function of F after processing
such as compression or blurring. If automatic video indexing and image retrieval are
desired, then the following need to be identified:

$$\frac{dF}{dt} = 0 \quad \text{or} \quad F(t+1) - F(t) = 0$$

for video segmentation; and

$$F(Q) - F(\mathcal{I}_n) = 0$$

for retrieval of query image Q from the database containing images
$\mathcal{I}_1, \ldots, \mathcal{I}_n, \ldots$

$\tilde{\mathcal{I}}$ represents a wavelet transform of $\mathcal{I}$. $\mathcal{I}'$ is the image $\mathcal{I}$ in the wavelet
domain with the small coefficients set to zero. Studies on visual data compression
indicate that visually $\mathcal{I}' \approx \mathcal{I}$ when the small coefficients of the wavelet
transformation of image $\mathcal{I}$ are discarded. Suppose $F(\mathcal{I}')$ is a feature extracted from
$\mathcal{I}'$ that invariantly preserves the visual content of $\mathcal{I}$. Then visually we have
$F(\mathcal{I}') - F(\mathcal{I}) \to 0$. F can therefore be used as a discriminant function such that
[...], where $\mathcal{I}_a$ and $\mathcal{I}_b$ are two different images
(frames) that are visually different.

For video segmentation and image retrieval, the overall content change
between two images or two video frames needs to be measured. The measurement
should reflect the overall structures of two images or frames. S-distance is a
measurement of the distance between two images or frames in the wavelet domain.
S-distance gives a measurement of how many significant LFCs and/or HFCs of two
images are in common. As a result, S-distance provides a good measurement of the
overall similarities and/or differences between two images or frames and can be
used for browsing and searching images as will be described further below.

$V_t$ ($t \in [0, n]$) represents a frame t of video sequence V and $\mathcal{I}$ represents an
image; $v^1$ and $v^2$ represent shot 1 and shot 2. The image size is X by Y. $I_t(x,y)$ is
the intensity of the (x,y)th coefficient of frame t and $I(x,y)$ is the intensity of the
(x,y)th coefficient of image $\mathcal{I}$, where $x \in [1, X]$ and $y \in [1, Y]$.

To define S-distance, wavelet transformation is performed on two images.
For example, the two images can be the query and the target images for image
retrieval or two consecutive frames for video segmentation. The wavelet coefficient
of $I_t(x,y)$ is denoted as $\tilde{I}_t(x,y)$. The LFC signature $S_L$ and the HFC signature
$S_H$ of each frame/image are defined as follows:

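$$S_L(V_t) = \big(\delta(\tilde{I}_t(x,y))\big) =
\begin{pmatrix}
\delta(\tilde{I}_t(0,0)) & \delta(\tilde{I}_t(1,0)) & \cdots \\
\delta(\tilde{I}_t(0,1)) & \cdots & \\
\vdots & &
\end{pmatrix}, \quad (x,y) \in V_{tL}, \text{ for video frames, and}$$

$$S_L(\mathcal{I}) = \big(\delta(\tilde{I}(x,y))\big) =
\begin{pmatrix}
\delta(\tilde{I}(0,0)) & \delta(\tilde{I}(1,0)) & \cdots \\
\delta(\tilde{I}(0,1)) & \cdots & \\
\vdots & &
\end{pmatrix}, \quad (x,y) \in \mathcal{I}_L, \text{ for images.}$$

$$S_H(V_t) = \big(\delta(\tilde{I}_t(x,y))\big) =
\begin{pmatrix}
\delta(\tilde{I}_t(x',y')) & \delta(\tilde{I}_t(x'+1,y')) & \cdots \\
\delta(\tilde{I}_t(x',y'+1)) & \cdots & \\
\vdots & &
\end{pmatrix}, \quad (x,y) \in V_{tH}, \text{ for video frames, and}$$

$$S_H(\mathcal{I}) = \big(\delta(\tilde{I}(x,y))\big) =
\begin{pmatrix}
\delta(\tilde{I}(x',y')) & \delta(\tilde{I}(x'+1,y')) & \cdots \\
\delta(\tilde{I}(x',y'+1)) & \cdots & \\
\vdots & &
\end{pmatrix}, \quad (x,y) \in \mathcal{I}_H, \text{ for images,}$$

where $\delta(\tilde{I}(x,y)) = n$ when $\varepsilon_{n-1} < \tilde{I}(x,y) < \varepsilon_n$, and n = 0, 1, 2, ...
$\mathcal{I}_L$ ($\mathcal{I}_H$) represents the low (high) frequency subband, as does $V_{tL}$ ($V_{tH}$).
Notice here that $I(x,y)$ is a single channel of a multi-channel intensity function.
After finding F (the LFC and HFC signatures), which is the discriminant signature of
$\mathcal{I}$ and is also a feature extracted from $\mathcal{I}'$, algorithms that utilize this feature can be
used for video processing applications as described further below.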
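
As a non-limiting illustration, the quantization that turns wavelet coefficients into a
signature can be sketched as follows. The sketch makes several assumptions that the text
does not state: the coefficients of a subband are supplied as a flat list, the thresholds
are sorted in ascending order, the magnitude of each coefficient is quantized, and a
coefficient equal to a threshold falls into the lower level. The names quantize and
signature are hypothetical.

```python
from bisect import bisect_left


def quantize(coefficient, thresholds):
    """Return the level n such that thresholds[n-1] < |coefficient| <= thresholds[n].

    Using the magnitude and the closed upper bound are assumptions of this sketch.
    """
    return bisect_left(thresholds, abs(coefficient))


def signature(coefficients, thresholds):
    """Quantize every wavelet coefficient of one subband into its level."""
    return [quantize(c, thresholds) for c in coefficients]


print(signature([0.5, -2.0, 4.0, -10.0, 100.0], [1.0, 4.0, 16.0]))
```

Small coefficients map to level 0 under this sketch, so they contribute nothing when
two signatures are later compared.
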

Referring back to FIG. 5, a cut transition detector 130 that is connected to the
LFC generator 120 identifies the cut transitions in a video sequence. A fade detector
134 that is connected to the HFC generator 124 identifies starting and ending points of the
fade transitions in the video sequence. A dissolve segments collector 138 is connected
to the HFC generator 124 and identifies potential starting and ending points of the
dissolve transitions of the video sequence.

A dissolve transition verifier 142 confirms the existence of the potential
dissolve transitions identified by the dissolve segments collector 138 using a double
frame differencing (DFD) algorithm. A segmented data generator 142 collects the cut
transition data, the fade transition data, and the dissolve transition data identified by the
cut detector 130, the fade detector 134, and the dissolve transition verifier 142. The
segmented data generator 142 transmits the transition data to the interface 90 for
storage within the video databases 92, 94, 98, or for transmission to other computers
or I/O devices 104.

Referring now to FIG. 6, the cut transition detector 130 is connected to the
LFC generator 120 and includes an S-distance difference generator 150, a
threshold generator 152, and a comparator 154. A cut transition data collector 154
collects the cut transition data for the video sequence. A smoothing filter (not shown) may
be used on the output of the S-distance difference generator 150 or the comparator 154
if desired.

Referring to FIG. 7, the operation of the cut detector 130 is illustrated in
further detail. In step 160, a value for the cut threshold is set. In step 162, the
S-distance function difference is calculated.

To calculate S-distance, weighting functions are applied to the LFC and the
HFC signatures. Then, the S-distance function difference is calculated for the frames
of the video sequence. The S-distance difference is calculated for pairs of consecutive
frames, such as frame t and frame t+1. S-distance measures the distance between two
images or two consecutive frames of the video sequence by taking the difference
between LFC and HFC signatures after weighting functions are applied:

$$S(t,t+1) = \phi_L S_L(t,t+1) + \phi_H S_H(t,t+1) = \phi_L \,|S_L(t+1), S_L(t)| + \phi_H \,|S_H(t+1), S_H(t)|$$
$$= \phi_L \,|\Omega_L^{t+1} S_L(t+1) - \Omega_L^{t} S_L(t)| + \phi_H \,|\Omega_H^{t+1} S_H(t+1) - \Omega_H^{t} S_H(t)|$$

where $\phi_L$, $\phi_H$ and $\Omega$ are weighting functions.

When identifying cut sequences, the high frequency signature components
and/or the weighting function are set to 0. For any consecutive frames, if
$S(t,t+1) > \alpha$, where $\alpha$ is the cut threshold, then from frame t to frame t+1 (or
alternatively [...]) observes a cut transition, otherwise no cut transition
is observed.

In step 164, the S-distance function difference for frames t and t+1 is
compared to the cut threshold. If the S-distance function difference exceeds the cut
threshold as determined at step 166, a cut transition is declared at step 168. If not, a
cut transition is not declared at step 170. Additional pairs of frames for t+2, t+3, ...,
t+n are handled similarly.
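
The comparison of steps 162 through 170 can be sketched as follows, purely as an
illustration. The sketch assumes that each frame is represented by a pair of flat lists
(LFC signature, HFC signature), that the distance between two signatures is the sum of
absolute element differences, and that the signature weighting functions are equal to one.
The names signature_distance, s_distance and detect_cuts are hypothetical.

```python
def signature_distance(sig_a, sig_b):
    """Sum of absolute element-wise differences (an assumed reading of |.|)."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))


def s_distance(frame_a, frame_b, phi_l=1.0, phi_h=1.0):
    """Weighted S-distance between two frames given as (lfc, hfc) signature pairs."""
    low = signature_distance(frame_a[0], frame_b[0])
    high = signature_distance(frame_a[1], frame_b[1])
    return phi_l * low + phi_h * high


def detect_cuts(frames, threshold):
    """Return every t for which a cut is declared between frame t and frame t+1.

    The high frequency weight is set to 0, so only the LFC signatures are compared.
    """
    return [t for t in range(len(frames) - 1)
            if s_distance(frames[t], frames[t + 1], phi_h=0.0) > threshold]


frames = [([1, 1, 1, 1], [9]), ([1, 1, 1, 2], [0]),
          ([5, 5, 5, 5], [9]), ([5, 5, 4, 5], [0])]
print(detect_cuts(frames, threshold=5))
```

In the example the HFC signatures change on every frame, but only the jump in the LFC
signatures between the second and third frames exceeds the threshold.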

Referring now to FIG. 8, the fade transition detector 134 is illustrated in
further detail. The fade transition detector 134 is connected to the high frequency
component signal generator 124 and includes a summing signal generator 180, a linear
function generator 184, and a comparing circuit 188. Because the cut transition
detector 130 has identified cut transitions, the fade detector 134 analyzes only the
sub-sequences of the video between two consecutive cuts.

The changing characteristics of the video frames within fade and dissolve
transitions can be modeled as:

$$E(t) = F(v^1_t)\,\eta(t) + F(v^2_t)\,(1 - \eta(t)) + C, \quad \forall t \in (t_0, t_N)$$

where $E(t)$ is a characteristic function; $F(v^1_t)$, $F(v^2_t)$ represent unedited moving
image sequence characteristic functions of two consecutive shots; $\eta(t)$ is a decreasing
function with $\eta(t_0) = 1$ and $\eta(t_N) = 0$; C is the constant (or background) image such
as text, label or logo which exists in all frames within a shot transition; and $t_0$, $t_N$ are
the starting and ending points of a transition.

During a fade out, the second sequence is absent and $F(v^2_t) = 0$ for
$\forall t \in (t_0, t_N)$. In a fade in, $F(v^1_t) = 0$ for $\forall t \in (t_0, t_N)$, i.e.,

$$E_{fade\text{-}in}(t) = F(v^2_t)\,(1 - \eta(t)) + C$$

During a dissolve, neither $S_1(t)$ nor $S_2(t)$ is equal to zero.

$$E_{dissolve}(t) = F(v^1_t)\,\eta(t) + F(v^2_t)\,(1 - \eta(t)) + C$$

Examples of the changing characteristic functions $E(t)$ include the changing intensity
function $I(x,y,t)$ and the edge intensity function $G(x,y,t)$.

$$E(x,y,t) = I_1(x,y,t)\,\eta(t) + I_2(x,y,t)\,(1 - \eta(t)) + I_c(x,y)$$
$$E(x,y,t) = G_1(x,y,t)\,\eta(t) + G_2(x,y,t)\,(1 - \eta(t)) + G_c(x,y)$$

where $I_1$ and $I_2$ define the intensity of the first and second unedited moving image
sequences; $G_1$ and $G_2$ are the pixel intensity functions of the corresponding edge image
sequences of image sequences $I_1$ and $I_2$; and $I_c$ and $G_c$ represent the pixel intensity function
of the constant image and the constant edge image, respectively. Notice that in the
above equation $G(x,y,t) = 0$ when $I(x,y)$ is not an edge point. Hence only the edge
points in the edge image will contribute to this characteristic function.

Based on the number of frames in a shot to be analyzed for a fade transition,
the values for a decreasing function and a weighting function used for fade transition
detection are set by the linear function generator 184 in step 190 in FIG. 9. At step
194, the high frequency coefficients for each frame are summed:

$$S_H(t) = |S_H(t)| = \sum_{(x,y)} \delta(\tilde{I}_t(x,y)), \quad (x,y) \in V_{tH}$$

The comparing circuit 188 generates a difference between the decreasing function
output by the linear function generator 184 and the sum output by the summing signal
generator 180 for each frame. If the difference is approximately zero for each frame in
the shot sequence, as determined at step 200, i.e., if
$S_H(t) - S_H(T_0)\,\eta(t) \approx 0$, $t \in [T_0, T_N]$, then
a fade-out is declared at step 202. If not, the difference circuit subtracts one minus the
decreasing function from the sum at step 204. If the difference is approximately equal
to zero for each frame in the shot as determined at step 208, i.e., if
$S_H(t) - S_H(T_N)\,(1 - \eta(t)) \approx 0$, $t \in [T_0, T_N]$, then a fade-in is declared at step 210.
Otherwise, neither a fade-in transition nor a fade-out transition is declared at step
212.
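
The decision flow of FIG. 9 can be sketched as follows for illustration. The sketch assumes
that the summed HFC value of each frame in a candidate segment is already available, that
the decreasing function is linear from 1 at the first frame to 0 at the last frame, and that
"approximately zero" means within a caller-supplied tolerance. The names linear_eta and
classify_fade and the tolerance parameter are hypothetical.

```python
def linear_eta(n_frames):
    """Linear decreasing function: 1 at the first frame, 0 at the last (n_frames >= 2)."""
    return [1 - t / (n_frames - 1) for t in range(n_frames)]


def classify_fade(hfc_sums, tolerance):
    """Return 'fade-out', 'fade-in' or None for the summed HFC values of one segment."""
    eta = linear_eta(len(hfc_sums))
    start, end = hfc_sums[0], hfc_sums[-1]
    # Fade-out test: the sum follows start * eta(t) over the whole segment.
    if all(abs(s - start * e) <= tolerance for s, e in zip(hfc_sums, eta)):
        return "fade-out"
    # Fade-in test: the sum follows end * (1 - eta(t)) over the whole segment.
    if all(abs(s - end * (1 - e)) <= tolerance for s, e in zip(hfc_sums, eta)):
        return "fade-in"
    return None


print(classify_fade([100, 75, 50, 25, 0], tolerance=1.0))
print(classify_fade([0, 25, 50, 75, 100], tolerance=1.0))
print(classify_fade([50, 50, 50, 50, 50], tolerance=1.0))
```

A segment whose summed HFC stays constant matches neither ramp, so no fade is declared.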

Referring now to FIG. 10, the dissolve transition segments collector 138 is
illustrated in further detail. The dissolve transition segments collector 138 is connected
to the HFC generator 124. The high frequency coefficients for each frame are
summed,

$$S_H(t) = |S_H(t)| = \sum_{(x,y)} \delta(\tilde{I}_t(x,y)), \quad (x,y) \in V_{tH}$$

An ideal dissolve signal generator 224 generates the ideal dissolve function that is a
changing statistical function. A smoothing filter 228 smooths the summed HFC. A
difference circuit 229 generates a difference between the output of the ideal dissolve
signal generator 224 and the filtered and summed HFC output. If the difference is
approximately zero as determined at 230, i.e., if
$S_H(t) - S_H(T_0)\,\eta(t) \approx 0$, $t \in [T_0, T_{N/2}]$ and
$S_H(t) - S_H(T_0)\,(1 - \eta(t)) \approx 0$, $t \in [T_{N/2}, T_N]$, then $T_0$ and $T_N$ are declared as
potential starting and ending points of a dissolve. Experimental results show that the
HFC more accurately predicts fades and dissolves as compared to color histogram, frame
differencing, and motion vector analysis. Generally $S_H(t)$ of a dissolve transition is
"U"-shaped with the center of the "U" being a local minimum identifying a mid-point of
a potential dissolve transition and local maxima on both sides thereof identifying
starting and ending points of the potential dissolve transition.
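
One simple way to collect "U"-shaped candidates is sketched below for illustration. This
is not the difference-circuit comparison against the ideal dissolve function described
above; it is a swapped-in heuristic that looks for a local minimum flanked by local maxima
in the summed HFC signal. The moving-average filter, the min_depth parameter and the names
smooth and dissolve_candidates are assumptions of the sketch.

```python
def smooth(signal, radius=1):
    """Moving-average filter; near the edges only the available samples are used."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def dissolve_candidates(hfc_sums, min_depth):
    """Return (start, mid, end) index triples: local maximum, local minimum, local maximum.

    A candidate is kept only if the dip is at least min_depth below the lower
    of its two shoulders. Flat-bottomed dips are not handled by this sketch.
    """
    s = hfc_sums
    candidates = []
    for mid in range(1, len(s) - 1):
        if not (s[mid] < s[mid - 1] and s[mid] <= s[mid + 1]):
            continue
        start = mid
        while start > 0 and s[start - 1] > s[start]:
            start -= 1          # climb left to the nearest local maximum
        end = mid
        while end < len(s) - 1 and s[end + 1] > s[end]:
            end += 1            # climb right to the nearest local maximum
        if min(s[start], s[end]) - s[mid] >= min_depth:
            candidates.append((start, mid, end))
    return candidates


print(dissolve_candidates([10, 10, 8, 5, 2, 5, 8, 10, 10], min_depth=5))
```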

Referring now to FIGS. 10 and 11, the dissolve transition verifier 142 is
illustrated in further detail. The dissolve transition verifier 142 is connected to the
dissolve transition segments collector 138 and receives the potential starting and ending
points of dissolve transitions therefrom. The dissolve transition verifier 142 is
connected to the LFC signature generator 120 and includes a double frame difference (DFD)
generator 250 which is connected to the output of the dissolve transition segments
collector 138.

An ideal dissolve has a "V"-shaped intensity function and has no local motion
or camera motion in the sequence. The change of intensity of the first shot has a
negative slope and is linear. There exists a frame $i_k$ with its intensity $I(x,y,i_k)$ equal to
the average intensity of the starting and ending frames $I(x,y,i_0)$ and $I(x,y,i_N)$ of the
dissolve when N=2m+1. That is,

$$I(x,y,i_k) = \frac{I(x,y,i_0) + I(x,y,i_N)}{2}$$

(Note when N=2m (m is an integer), k is then a pseudo frame.) The DFD of frame $i_t$
of a moving image sequence I is defined as the accumulation of a pixel by pixel
comparison between this average and the intensity of frame $i_t$, where $i_t$ is a frame in a
potential dissolve transition segment.

$$DFD(i_t) = \sum_{x,y} f\!\left(\left|\frac{I(x,y,i_0) + I(x,y,i_N)}{2} - I(x,y,i_t)\right|\right)$$

The dissolve transition verifier 142 further includes a smoothing filter 254
which smooths the DFD signal. A verified dissolve transition collector
256 stores the verified dissolve transition data for a video sequence.

Referring to FIG. 11, at step 260, the DFD signal generator 250 computes the
DFD signal on the LFC signature for the starting and ending points provided by the
dissolve segments collector 138. At step 264, the smoothing filter 254 filters the data
provided by the DFD signal generator 250. At step 266, the slope of the DFD signal
is used to identify whether the DFD signal is concave (i.e., if
$DFD(t) - DFD(T_0)\,\eta(t) \approx 0$, $t \in [T_0, T_{N/2}]$, and
$DFD(t) - DFD(T_N)\,(1 - \eta(t)) \approx 0$, $t \in [T_{N/2}, T_N]$) and
whether the depth of the concavity exceeds a threshold. If both are present, a dissolve
transition is declared at step 268. If one or both are not present, then a dissolve
transition is not declared at step 269.
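
The double frame difference and the concavity check can be sketched as follows,
by way of example only. The sketch assumes that each frame of the candidate segment is a
small grayscale image stored as a list of rows, takes the function f in the DFD definition
to be the identity (f is not defined in the text), and replaces the slope comparison
against the ideal function with a simpler test that the DFD signal falls to a single
minimum and rises again. The names double_frame_difference and is_dissolve and the
min_depth parameter are hypothetical.

```python
def double_frame_difference(frames):
    """DFD of every frame: accumulated |average(first, last) - frame| over all pixels."""
    first, last = frames[0], frames[-1]
    average = [[(a + b) / 2 for a, b in zip(row_a, row_b)]
               for row_a, row_b in zip(first, last)]
    return [sum(abs(avg - px)
                for avg_row, px_row in zip(average, frame)
                for avg, px in zip(avg_row, px_row))
            for frame in frames]


def is_dissolve(dfd, min_depth):
    """Accept when the DFD signal dips to one minimum by at least min_depth."""
    mid = dfd.index(min(dfd))
    falling = all(dfd[i] >= dfd[i + 1] for i in range(mid))
    rising = all(dfd[i] <= dfd[i + 1] for i in range(mid, len(dfd) - 1))
    depth = min(dfd[0], dfd[-1]) - dfd[mid]
    return falling and rising and depth >= min_depth


# An ideal linear dissolve from an all-0 frame to an all-100 frame (1 x 2 pixels).
segment = [[[v, v]] for v in (0, 25, 50, 75, 100)]
dfd = double_frame_difference(segment)
print(dfd, is_dissolve(dfd, min_depth=50))
```

For a static segment every frame equals the average of the end frames, so the DFD signal
is flat and the depth test rejects it.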

As can be appreciated, the automatic video indexing system 80 automatically
indexes video sequences with a high probability of identification of both abrupt and
gradual shot transitions. Furthermore, a key frame can be selected from each shot for
image retrieval and shot summary by selecting a first frame, an intermediate frame, or
a combination of frames.
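
Key frame selection from detected transitions can be illustrated with the following sketch.
It assumes that only cut positions are used to delimit shots, that a cut at index t
separates frame t from frame t+1, and that the key frame is either the first or the middle
frame of each shot. The names shots_from_cuts and key_frames are hypothetical.

```python
def shots_from_cuts(n_frames, cuts):
    """Split frames 0..n_frames-1 into (first, last) shots; a cut at t ends a shot at t."""
    shots, start = [], 0
    for t in sorted(cuts):
        shots.append((start, t))
        start = t + 1
    shots.append((start, n_frames - 1))
    return shots


def key_frames(shots, mode="middle"):
    """Pick one key frame index per shot: the first frame or the middle frame."""
    if mode == "first":
        return [first for first, _ in shots]
    return [(first + last) // 2 for first, last in shots]


shots = shots_from_cuts(10, [3, 6])
print(shots, key_frames(shots), key_frames(shots, mode="first"))
```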

Referring now to FIG. 12, an image retrieval system 300 is illustrated.
Reference numbers from FIG. 5 have been utilized to identify similar elements in FIG.
12. The image retrieval system 300 includes a query image capture device 310 which
employs I/O devices 104. The query image can be input using I/O devices 104 such as
a scanner for capturing a photograph or sketch. Drawing software associated with
processor 84 and memory 86 may also be used to input a sketch. Alternately, the
query image can be selected on the Internet, input using portable storage media, stored
on a hard drive or selected from the image databases 92, 94, and/or 98. Other suitable
query image sources will be apparent to skilled artisans. The query image capture
device 310 is connected to the frequency decomposer 322 which provides frequency
decomposition of the query image using wavelet transformation, DFT, DCT, FFT, or
other suitable frequency domain transformation. Preferably, however, wavelet
decomposition using Haar transformation is employed.

The output of the frequency decomposer 322 is connected to the LFC generator
120 and the HFC generator 124. An image retrieving device 320 retrieves images for
comparison to the query image from at least one of the image databases 92, 94, and/or
98. The image retrieving device 320 outputs the images to the frequency decomposer
322 which similarly performs wavelet transformation, DCT, DFT, FFT, or other
suitable frequency domain transformation.

The output of the frequency decomposer 116 is input to the LFC generator 120
and the HFC generator 124. The outputs of the LFC generators 314 and 324 are input
to a LFC weighting device 330. The outputs of the HFC generators 316 and 326 are
input to a HFC weighting device 340. After suitable weighting is applied, an
S-distance generator 342 generates the S-distance measurement.

The S-distance measurement performed on frames t and t+1 can be used on
the query image and database image. $S(t,t+1)$ is replaced by $S(Q, \mathcal{I}_n)$ where Q
represents the query image and $\mathcal{I}_n$ represents the nth image in the database.

$$S(Q,\mathcal{I}_n) = \phi_L S_L(Q,\mathcal{I}_n) + \phi_H S_H(Q,\mathcal{I}_n) = \phi_L \,|S_L(Q), S_L(\mathcal{I}_n)| + \phi_H \,|S_H(Q), S_H(\mathcal{I}_n)|$$
$$= \phi_L \,|\Omega_L^{Q} S_L(Q) - \Omega_L^{\mathcal{I}_n} S_L(\mathcal{I}_n)| + \phi_H \,|\Omega_H^{Q} S_H(Q) - \Omega_H^{\mathcal{I}_n} S_H(\mathcal{I}_n)|$$

The images with the least S-distance measurement to the query image Q can then be
returned in order of highest to lowest similarity as retrieval results in a manner similar
to text-based browsing and searching.



As can be appreciated, the query image is compared to multiple images from
the image databases 92, 94, and/or 98 and the S-distance measurement defines the
relative similarity between the query image and the database image. Subsequently, the
processor 84 and memory 86 arrange the query results in order of highest to lowest
similarity and output the query results to one of the I/O devices 104 for selection by
the user.
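
The ordering of query results can be sketched as follows for illustration. The sketch
assumes that signatures have already been computed for the query and for every database
image, that each is a pair of flat lists (LFC signature, HFC signature), that the distance
between signatures is the sum of absolute element differences, and that the signature
weighting functions are equal to one. The names s_distance and rank_images and the
dictionary layout of the database are hypothetical.

```python
def s_distance(query, candidate, phi_l=1.0, phi_h=1.0):
    """Weighted S-distance between two (lfc, hfc) signature pairs."""
    low = sum(abs(a - b) for a, b in zip(query[0], candidate[0]))
    high = sum(abs(a - b) for a, b in zip(query[1], candidate[1]))
    return phi_l * low + phi_h * high


def rank_images(query, database):
    """Return image names ordered from smallest to largest S-distance to the query."""
    return sorted(database, key=lambda name: s_distance(query, database[name]))


query = ([1, 2, 3], [0, 1])
database = {
    "a": ([1, 2, 3], [0, 1]),
    "b": ([3, 2, 1], [1, 1]),
    "c": ([1, 2, 4], [0, 1]),
}
print(rank_images(query, database))
```

The smallest S-distance corresponds to the highest similarity, so the first name in the
returned list is the best match.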

Referring now to FIG. 13, a second embodiment of the image retrieval system
is illustrated at 350. A query image capture device 352 captures a query image as
described above. An image retriever 354 retrieves images for comparison with the
query image. Depending on how the image is stored, the output of the image
retriever 354 is input to a frequency decomposer 356, to a LFC signal
generator 358 and a HFC signal generator 360, or to a LFC and HFC weighting device
364 as indicated by dotted line 365. Processing of the S-distance measurement is
similar to that described above with respect to FIG. 12. By eliminating some of the
processing on the database images, computational efficiency can be improved.

From the foregoing, it will be understood that the invention provides an
image retrieval system that generates a list of possible database images that match a
query image based upon the similarity between the database image and the query
image. Skilled artisans can appreciate that while discrete functional blocks have
been identified in FIGS. 5, 6, and 8, these functions can be combined into larger
functional blocks which perform multiple functions. The image retrieval system
allows large databases to be searched for images. Browsing and searching extensive
image databases is dramatically simplified.

While the invention has been described in its presently preferred embodiments,
it will be understood that the invention is capable of certain modifications and changes
without departing from the spirit of the invention as set forth in the appended claims.